Statistics in Medicine

17 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
An E-value-Informed Sensitivity Analysis Framework for Hybrid Controlled Trials
2026-03-06 epidemiology 10.64898/2026.03.05.26347653
#1 (3.5%)

Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-...

2
Federated penalized piecewise exponential model for horizontally distributed survival data: FedPPEM
2026-02-12 health informatics 10.64898/2026.02.11.26346054
Top 0.1% (1.9%)

Cox proportional hazard regressions are frequently employed to develop prognostic models for time-to-event data, considering both patient-specific and disease-specific characteristics. In high-dimensional clinical modeling, these biological features can exhibit high collinearity due to inter-feature relationships, potentially causing instability and numerical issues during estimation without regularization. For rare diseases such as acute myeloid leukemia (AML), the sparsity and scarcity of data...

3
Aging Out of the Blue: Estimating and Calibrating Region-specific Epigenetic Clocks for a Blue Zone via SuperLearner
2026-03-03 epidemiology 10.64898/2026.03.02.26346901
Top 0.3% (1.2%)

Epigenetic clocks estimate biological age from DNA methylation patterns at CpG sites, providing robust predictions of mortality and morbidity risk. "Blue zones"--regions of exceptional longevity--offer a unique opportunity to investigate how biological aging diverges from chronological age. However, standard clocks are typically trained on large, heterogeneous datasets, reflecting average population trends rather than region-specific dynamics. Using data from the Costa Rican Longevity and Health...

4
Novel Representations of Vaccine Protection Against Progression to Severe Disease Over Time
2026-02-14 epidemiology 10.64898/2026.02.12.26346197
Top 0.4% (0.9%)

Background: Vaccines can prevent severe disease by preventing infection or by reducing progression among those who become infected. Vaccine effectiveness against progression given infection is often used to quantify this second mechanism, but it conditions on infection, which is itself affected by vaccination. As a result, this estimand lacks a clear causal interpretation and may behave non-intuitively over time. Methods: We introduce a conceptual framework that models protection against infection ...

5
Act or Defer: Error-Controlled Decision Policies for Medical Foundation Models
2026-02-26 health informatics 10.64898/2026.02.23.26346927
Top 0.4% (0.7%)

Clinical deployment of foundation models requires decision policies that operate under explicit error budgets, such as a cap on false-positive clinical calls. Strong average accuracy alone does not guarantee safety: errors can concentrate among patients selected for action, leading to harm and inefficient use of healthcare resources. Here we introduce StratCP, a stratified conformal framework that turns foundation model predictions into decision-ready outputs through error-contro...

6
A Governance-Driven, Real-World Data-Calibrated Health Informatics Framework for Longitudinal Utilization Forecasting in Oncology and Complex Chronic Conditions
2026-02-26 health informatics 10.64898/2026.02.23.26346919
Top 0.5% (0.7%)

Background: Healthcare utilization forecasting systems are often derived from static, annualized market share assumptions that fail to represent real-world treatment dynamics. Such approaches systematically misestimate future utilization by ignoring longitudinal treatment sequencing, discontinuation with surveillance, recurrence-driven re-entry, and provider adoption dynamics. Objective: This study proposes a reusable, governance-driven health informatics forecasting framework designed to generate ...

7
A Mendelian randomization-based drug repurposing pipeline
2026-03-02 epidemiology 10.64898/2026.02.28.26347341
Top 0.5% (0.7%)

Drug repurposing offers the opportunity to identify promising drug targets efficiently using existing data, but there are currently limitations to these efforts; there is a particular need for versatile, but rigorous high-throughput approaches. As such, we developed a flexible, high-throughput, Mendelian randomization (MR)-based drug repurposing pipeline with three stages: 1) MR-based identification, 2) MR-based validation and prioritization, and 3) application. This pipeline can be applied to a...

8
Predicting Salmonella Typhi incidence using prevalence metrics from sentinel studies of community-onset bloodstream infections
2026-02-15 public and global health 10.64898/2026.02.13.26346225
Top 0.6% (0.7%)

Background: Typhoid fever incidence estimates are central to policy decisions on vaccine introduction and investments in non-vaccine prevention and control but are often unavailable. We explored whether prevalence metrics from sentinel studies of community-onset bloodstream infections could accurately predict local Salmonella Typhi (S. Typhi) incidence. Methods: Using a previous systematic review (January 2018-December 2024), we identified studies reporting both typhoid incidence and prevalence of ...

9
Handling onset age inconsistencies in longitudinal healthcare survey data
2026-02-23 health informatics 10.64898/2026.02.20.26346741
Top 0.8% (0.5%)

Longitudinal healthcare surveys frequently contain inconsistencies in self-reported onset ages, where participants report different ages for the same condition between enrollment and follow-up surveys. We propose two methods to handle this challenge. First, we introduce a procedure that aggregates inconsistency patterns to construct participant-level reliability scores, enabling researchers to stratify participants and prioritize analysis on high-reliability cohorts. Seco...

10
The Impact of MFN on Oncology and Hematology Treatments
2026-02-20 health economics 10.64898/2026.02.19.26346624
Top 0.9% (0.5%)

Background: The Most Favored Nation (MFN) policy is a mechanism that incorporates foreign prices to determine the maximum allowable net price for any branded drug within US government-funded healthcare. Two proposed rules, the Global Benchmark for Efficient Drug Pricing ("GLOBE") (90 Fed. Reg. 60,244) for Medicare Part B and the Guarding US Medicare Against Rising Drug Costs ("GUARD") (90 Fed. Reg. 60,338) for Medicare Part D, invoke the Center for Medicare and Medicaid Innovation Centers payment ...

11
Comparison of methods for assessing effects of risk factors on disease progression in Mendelian randomization under index event bias
2026-03-02 epidemiology 10.64898/2026.02.26.26347193
Top 0.9% (0.5%)

Mendelian randomization has emerged as a transformative approach for inferring causal relationships between risk factors and disease outcomes. However, applying Mendelian randomization to disease progression - a critical step in validating pharmacological targets - is hampered by index event bias. This form of selection bias occurs because analyses of disease progression are necessarily restricted to individuals who have already experienced the disease event. Here, we present a comprehensive eva...

12
The Independence of Discrimination and Calibration in Clinical Risk Prediction: Lessons from a Multi-Timeframe Diabetes Prediction Framework
2026-02-14 health informatics 10.64898/2026.02.12.26346147
Top 1.0% (0.5%)

Background: Clinical risk prediction models are typically evaluated by discrimination (area under the receiver operating characteristic curve, AUC), with calibration receiving less attention. We developed a multi-timeframe diabetes prediction framework emphasizing calibration and used synthetic data validation to investigate whether good discrimination guarantees good calibration. Methods: We generated 500,000 synthetic patients using published epidemiological parameters from QDiabetes-2018, FINDRI...

13
LLM-based reconstruction of longitudinal clinical trajectories in chronic liver disease.
2026-02-10 transplantation 10.64898/2026.02.10.26345124
Top 1% (0.4%)

Background & Aims: Liver cancer primarily develops in patients with chronic liver disease (CLD), yet most cases are diagnosed at an advanced stage with poor prognosis. While clinical surveillance of patients with CLD generates extensive longitudinal data, its unstructured free-text nature hinders large-scale research. To unlock this real-world evidence, we developed a scalable framework using open-source Large Language Models (LLMs) to transform unstructured clinical text into structured data. Me...

14
Integrating stakeholder perspectives in modeling routine data for therapeutic decision-making
2026-02-18 epidemiology 10.64898/2026.02.18.26346074
Top 1% (0.4%)

Background: Routinely collected health data are increasingly used to generate real-world evidence for therapeutic decision-making. Yet, stakeholders, including clinicians, pharmaceutical industry representatives, patient advocacy groups, and statisticians, prioritize different aspects of data quality, analysis, and interpretation. Without explicit consideration of these perspectives, analyses risk being fragmented, misaligned with end-user needs, or lacking transparency. Methods: We developed a sta...

15
Controlling for confounds in UK Biobank brain imaging data with small subsets of subjects
2026-03-03 epidemiology 10.64898/2026.03.02.26347455
Top 1% (0.4%)

The UK Biobank (UKB) Brain Imaging cohort contains data from almost 100,000 subjects and has yielded invaluable understanding of the links between the brain and health outcomes and lifestyles. Much of the understanding of these links has come from exploring the association between Imaging Derived Phenotypes (IDPs) and other variables that are unrelated to brain imaging, so-called non-Imaging Derived Phenotypes (nIDPs). When performing analysis of this kind, it is very important to control for we...

16
Joint modelling of PSA dynamics and prostate cancer risks: A population-based study
2026-02-22 epidemiology 10.64898/2026.02.15.26346131
Top 1% (0.4%)

While the prostate-specific antigen (PSA) test is a widely used prostate cancer screening tool, its application remains controversial. Opportunistic PSA testing generates complex data in which testing intensities, PSA levels, and prostate cancer diagnosis are interdependent. Conventional analyses rarely model these processes jointly. The objective of this study was to develop a population-based joint model to analyse PSA dynamics, retesting patterns, and prostate cancer risk. We used the Stockho...

17
Effectiveness of new treatment modalities for localized prostate cancer through patient-reported outcome measures: 5 years comparative study.
2026-03-05 epidemiology 10.64898/2026.03.04.26347624
Top 1% (0.4%)

Background: No randomized clinical trial comparing the most established new modalities of treatment for patients with localized prostate cancer has been published, and there is scarce comparative effectiveness research assessing Patient-Reported Outcome Measures (PROMs). Objective: To compare the impact of active surveillance, robot-assisted radical prostatectomy (RARP), intensity-modulated radiotherapy (IMRT), and real-time brachytherapy on patients, through PROMs, from pre-treatment to five years...

18
Methodological Guidance for Predictor Variable Selection for Adolescent Smoking Outcomes in Global Youth Tobacco Survey Using R and Python
2026-02-17 epidemiology 10.64898/2026.02.14.26346305
Top 1% (0.4%)

Background: The Global Youth Tobacco Survey (GYTS) is widely used to monitor tobacco use among adolescents worldwide. However, inconsistent analytical approaches, particularly in handling complex survey designs and predictor selection, limit comparability across countries, survey waves, and software platforms. Although much of the GYTS literature relies on proprietary tools such as SAS and SPSS, practical and transparent guidance on implementing reproducible, theory-informed analyses remains limited...

19
Early health technology assessment of digital diabetes screening in Switzerland: cost-effectiveness and budget impact analyses
2026-02-11 health economics 10.64898/2026.02.10.26345992
Top 2% (0.3%)

Objectives: Digital biomarkers offer scalable screening for type 2 diabetes, yet adoption is stalled by uncertainty regarding economic viability. This study evaluates the cost-effectiveness and budget impact of digital screening compared to opportunistic screening from a Swiss payer perspective. Methods: A probabilistic Markov cohort model was developed to simulate at-risk Swiss adults (age ≥45, BMI ≥25 kg/m²) over a 40-year horizon. The model incorporates a digital attritio...

20
A cost-effectiveness analysis of increased quadruple therapy use in heart failure with reduced ejection fraction in Singapore
2026-02-12 health economics 10.64898/2026.02.10.26346043
Top 2% (0.3%)

Background: Quadruple therapy, comprising an angiotensin receptor-neprilysin inhibitor (ARNI), β-blocker, mineralocorticoid receptor antagonist (MRA), and sodium-glucose cotransporter 2 inhibitor (SGLT2i), is guideline-recommended for heart failure with reduced ejection fraction (HFrEF). However, uptake in Singapore remains low. This study evaluated the cost-effectiveness of scaling up quadruple therapy from the current 30% uptake to realistic (80%) and stretch (100%) targets. Methods: We deve...